Lemmatization of Multi-word Common Noun Phrases and Named Entities in Polish
Abstract
This paper presents PoLem, a tool for lemmatizing multi-word common noun phrases and named entities in Polish. The tool is based on a set of manually crafted rules and heuristics that draw on a set of dictionaries (including morphological, named-entity, and inflection-pattern dictionaries). It reached an accuracy of 97.99% on a dataset of multi-word common noun phrases and 86.17% under case-sensitive evaluation on a dataset of named entities.
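The distinction between case-sensitive and case-insensitive evaluation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation script: the function name `lemmatization_accuracy`, the exact-match criterion, and the three example phrases are assumptions made purely for illustration.

```python
def lemmatization_accuracy(predicted, gold, case_sensitive=True):
    """Return the fraction of phrases whose predicted lemma equals the gold lemma.

    Hypothetical helper: exact string match is assumed as the scoring criterion.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    if not case_sensitive:
        # Ignore capitalization differences, e.g. in proper names.
        predicted = [p.lower() for p in predicted]
        gold = [g.lower() for g in gold]
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Illustrative data only; these phrases are not taken from the paper's datasets.
gold = ["Uniwersytet Warszawski", "czerwony samochód", "Nowy Jork"]
predicted = ["uniwersytet warszawski", "czerwony samochód", "Nowy Jork"]

strict = lemmatization_accuracy(predicted, gold, case_sensitive=True)
relaxed = lemmatization_accuracy(predicted, gold, case_sensitive=False)
print(f"case-sensitive: {strict:.2%}, case-insensitive: {relaxed:.2%}")
```

In this toy data the first prediction differs from the gold lemma only in capitalization, so it counts as an error under case-sensitive scoring and as correct under case-insensitive scoring. This is one reason a case-sensitive score on named entities can be the stricter of the two.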
Similar resources
UNED at ImageCLEF 2004: Detecting Named Entities and Noun Phrases for Automatic Query Expansion and Structuring
This paper describes UNED experiments at the ImageCLEF bilingual ad hoc task. Two different strategies are attempted: i) automatic expansion and translation using noun phrases; ii) automatic detection of named entities in the query for structured search on image caption fields. All our experiments obtain results above the average MAP for the bilingual task. Structured searches using named enti...
Investigating Embedded Question Reuse in Question Answering
The investigation presented in this paper is a novel method in question answering (QA) that enables a QA system to gain performance by reusing information in the answer to one question to answer another related question. Our analysis shows that a pair of questions in general open-domain QA can have an embedding relation through their mentions of noun phrase expressions. We present methods f...
Coreference Resolution of Named Entities and Noun Phrases in Web Pages
An approach for intra-document coreference resolution of named entities and noun phrases is proposed. This is a knowledge-poor, integrated approach to coreference resolution which relies on syntactic, discourse and semantic information (using WordNet). Our approach is also intended to exploit the structural features of web pages for the purposes of discourse analysis. This research is i...
Naming clusters in visualization studies: Parsing and filtering of noun phrases from citation contexts
The present study presents a semi-automatic method for parsing and filtering of noun phrases from citation contexts. The purpose of the method is to extract contextual, agreed upon, and pertinent noun phrases, to be used in visualization studies for naming clusters (concept groups) or concept symbols. The method is applied in a case study, which forms part of a larger dissertation work concerni...
Russian Named Entities Recognition and Classification Using Distributed Word and Phrase Representations
The paper presents results on Russian named entities classification and equivalent named entities retrieval using word and phrase representations. It is shown that a word or an expression’s context vector is an efficient feature to be used for predicting the type of a named entity. Distributed word representations are now claimed (and on a reasonable basis) to be one of the most promising distr...